Nao Mimoto - Dept. of Statistics : The University of Akron
TS Class Web Page – R resource page
Suppose \(Y_t\) is the observation. The Model says: \[ Y_t \hspace{3mm} = \hspace{3mm} m_t + S_t + X_t \]
\[\begin{align*} m_t &: \mbox{ Trend Component (linear, random)}\\ S_t &: \mbox{ Seasonal Component }\\ X_t &: \mbox{ Stationary Time Series (ARMA?)} \end{align*}\]
where \(EX_t=0\).
Seasonality repeats every \(s\) observation. \[ S_{t+s}=S_t. \]
Condition on Seasonal component: \(S_{t+s}=S_t\) and \(\sum_{j=1}^s S_j = 0\).
Accidental deaths in USA., 1973 to 1978 from Brockwell
acf1 <- acf
library(TSA)
acf <- acf1 # Load TSA package
D <- read.csv("https://nmimoto.github.io/datasets/acci.csv", header=T)
Acci <- ts(D, start=c(1973,1), freq=12) #- Turn D into ts object with frequency
plot(Acci, type='o', ylab="num of accidents")## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1973 9007 8106 8928 9137 10017 10826 11317 10744 9713 9938 9161 8927
## 1974 7750 6981 8038 8422 8714 9512 10120 9823 8743 9129 8710 8680
## 1975 8162 7306 8124 7870 9387 9556 10093 9620 8285 8433 8160 8034
## 1976 7717 7461 7776 7925 8634 8945 10078 9179 8037 8488 7874 8647
## 1977 7792 6957 7726 8106 8890 9299 10625 9302 8314 8850 8265 8796
## 1978 7836 6892 7791 8129 9115 9434 10484 9827 9110 9070 8633 9240
#--- plot with month ---
plot(Acci, type="l", ylab="num of accidents")
points(y=Acci, x=time(Acci), pch=as.vector(season(Acci))) Oil Filter Sales Data (Cryer p7) inside TSA package.
data(oilfilters) # load data inside TSA package
Oil <- oilfilters
is.ts(Oil) # is data in TS format?## [1] TRUE
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1983 2385 3302 3958 3302 2441 3107
## 1984 5862 4536 4625 4492 4486 4005 3744 2546 1954 2285 1778 3222
## 1985 5472 5310 1965 3791 3622 3726 3370 2535 1572 2146 2249 1721
## 1986 5357 5811 2436 4608 2871 3349 2909 2324 1603 2148 2245 1586
## 1987 5332 5787 2886 5475 3843 2537
## [1] 12
#--- plot with month ---
plot(Oil, type="l",ylab="Sales")
points(y=Oil, x=time(Oil), pch=as.vector(season(Oil)))\(s\) is the seasonality frequency. (e.g. \(s=12\) for monthly seasonality).
Seasonality repeats every \(s\) observation. \[ S_{t+s}=S_t. \]
Seasonality is not a trend. It’s a temporary deviation from overall trend.
\[
\sum_{j=1}^s S_j = 0.
\]
There are three poplular ways of removing seasonality:
MA filter
Seasonal Average
Seaonal Differencing
If \(s\) is odd, let it be \(2q+1\). Then use linear MA filter. \[ \hat m_t \hspace{3mm} = \hspace{3mm} \frac{1}{(2q+1)} \sum_{i=-q}^q Y_{t-j} \] If \(s\) is even, let it be \(2q\). Then use \[ \hat m_t \hspace{3mm} = \hspace{3mm} \frac{1}{2q} \Big(.5 Y_{t-q} + \sum_{i=-q-1}^{q-1} Y_{t-j} + .5 Y_{t+q}\Big) \]
Because seasonality should sum up to zero for each seasonal cycle, we have estimated trend: \[
Y_t = m_t + S_t + X_t
\]
\[
\hat m_t \hspace{3mm} = \hspace{3mm} \frac{1}{(2q+1)} \sum_{i=-q}^q m_{t-j}
+ \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q S_{t-j} }_{0}
+ \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q X_{t-j} }_{\mbox{ small }}.
\] Then estimate for (\(S_t\) + \(Y_t\)) is \[
w_k \hspace{3mm} = \hspace{3mm} X_{t} - \hat m_{t} \hspace{10mm} q < t < n-q.
\]
Use this to estimate the seasonal part, \[ \hat S_k \hspace{3mm} = \hspace{3mm} w_k - \frac{1}{s} \sum_{i=1}^s w_i. \] Now we have deseasonalized data (\(m_t\) + \(X_t\)) \[ d_t \hspace{3mm} = \hspace{3mm} Y_t - \hat S_t \hspace{10mm} t=1 \cdots n \] Now go back and re-estimate trend using \(d_t\).
t <- 1:100
Y <- -3 +.07*t + arima.sim(n = 100, list(ma = c(.4, .2) ) )
m <- filter(Y, rep(1/9, 9))
res <- Y-m
plot(Y, type="o")
lines(m, lwd=2, col="red") #-- Apply MA filter to Acci data
m.hat <- filter(Acci, c(.5, rep(1,10), .5)/12)
plot(Acci, type="o")
lines(m.hat, col="red")Filter reveals quadtaric trend
#--- Take Monthly Averages
Mav1 <- aggregate(c(Acci), list(month=cycle(Acci)), mean)$x #- 1yr long Mtly Av
M.av1 <- ts(Mav1[cycle(Acci)], start=start(Acci), freq=frequency(Acci)) #- Mtly Av
Acci.DS <- Acci-M.av1 #- Subtract from original
plot(M.av1, type="o") # Monthly Average## Warning in kpss.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.01 0.885 0.397
Stationary?
## Series: Acci.DS
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.4056
## s.e. 0.1082
##
## sigma^2 estimated as 66457: log likelihood=-494.53
## AIC=993.07 AICc=993.24 BIC=997.59
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.755 0.53 0.351 0.557 0.632 0.768 255.614
## Series: Acci.DS
## ARIMA(1,0,1) with zero mean
##
## Coefficients:
## ar1 ma1
## 0.9471 -0.3656
## s.e. 0.0451 0.1205
##
## sigma^2 estimated as 66208: log likelihood=-501.55
## AIC=1009.1 AICc=1009.46 BIC=1015.93
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.597 0.31 0.231 0.823 0.799 0.926 255.441
After we subtracted the seasonality (Monthly average \(S_t\)), We either have ARIMA(0,1,1) or ARIMA(2,0,0).
\[\begin{align*} Y_t &= \hspace{3mm} S_t + X_t \\\\ X_t &= \hspace{3mm} \phi_1 X_{t-1} + \phi_2 X_{t-2} + e_t \hspace{10mm} \mbox{(if ARIMA(2,0,0))} \end{align*}\]
(Box-Jenkins Method)
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
## KPSS ADF PP
## p-val: 0.1 0.01 0.01
## Series: diff(diff(Acci, 12))
## ARIMA(0,0,1)(0,0,1)[12] with non-zero mean
##
## Coefficients:
## ma1 sma1 mean
## -0.4963 -0.6146 21.0220
## s.e. 0.1351 0.1932 11.9921
##
## sigma^2 estimated as 98283: log likelihood=-424.27
## AIC=856.53 AICc=857.27 BIC=864.84
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.588 0.457 0.453 0.385 0.497 0.355 307.622
We took \(\bigtriangledown_{12}\), then \(\bigtriangledown\), then fit MA(1). Therefore our model is \[ \bigtriangledown \bigtriangledown_{12} Y_t \hspace{3mm} = \hspace{3mm} X_t \\ X_t \hspace{3mm} = \hspace{3mm} e_t + \theta_1 e_{t-1} \] This is called \[ \mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace{3mm} \mbox{ model } \\ \\ \mbox{ sARIMA}(0,1,1) (0,1,0)_{12} \hspace{3mm} \mbox{ model } \]
#--- Take Monthly Averages
D0 <- Oil
Mav1 <- aggregate(c(D0), list(month=cycle(D0)), mean)$x #- 1yr long Mtly Av
M.av <- ts(Mav1[cycle(D0)], start=start(D0), freq=frequency(D0))
Ds.Oil <- D0-M.av #- Subtract from original
plot(M.av, type="o")## Warning in pp.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.025 0.039 0.01
## Series: Ds.Oil
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.8626
## s.e. 0.0696
##
## sigma^2 estimated as 378095: log likelihood=-368.67
## AIC=741.35 AICc=741.62 BIC=745.05
## Series: Ds.Oil
## ARIMA(0,0,0) with zero mean
##
## sigma^2 estimated as 379437: log likelihood=-376.42
## AIC=754.85 AICc=754.93 BIC=756.72
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.518 0.525 0.241 0.818 0.804 0.173 622.503
After we subtracted the seasonality (Monthly average \(S_t\)), it was WN.
\[\begin{align*} Y_t &= \hspace{3mm} S_t + X_t \\\\ X_t &= \hspace{3mm} e_t \end{align*}\]
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.07 0.01 0.01
## Series: diff(Oil, 12)
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## -217.3889
## s.e. 132.7486
##
## sigma^2 estimated as 652522: log likelihood=-291.57
## AIC=587.14 AICc=587.5 BIC=590.31
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.71 0.645 0.502 0.967 0.993 0.055 807.789
We took \(\bigtriangledown_{12}\), then it looks like a WN. Therefore our model is \[ \bigtriangledown_{12} Y_t \hspace{3mm} = \hspace{3mm} X_t \\ \\ X_t \hspace{3mm} = \hspace{3mm} e_t \] This is called \[ \mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace{3mm} \mbox{ model } \\ \\ \mbox{ sARIMA}(0,0,0) (0,1,0)_{12} \hspace{3mm} \mbox{ model } \]